Challenges of the Email Domain for Text Classification
Abstract
Interactive classification of email into a user-defined hierarchy of folders is a natural domain for the application of text classification methods. This domain presents several challenges. First, the user's changing mailing habits mandate that classification technology adapt to a dynamic environment. Second, the classification technology needs to handle heterogeneity in folder content and folder size. Performance when a folder contains only a small number of messages is especially important. Third, methods must meet the processing and memory requirements of a software implementation. We study three promising methods and present an analysis of their behavior with respect to these domain-specific challenges.
Similar Resources
Topic Modeling and Classification of Cyberspace Papers Using Text Mining
Global cyberspace networks provide individuals with platforms where they can interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...
A Model for Extracting Information from Text Documents Based on Text Mining in the E-Learning Domain
As computer networks become the backbones of science and the economy, enormous quantities of documents become available. Text mining techniques have therefore been used to extract useful information from textual data. Text mining has become an important research area that discovers unknown information, facts, or new hypotheses by automatically extracting information from different written documents. T...
Document Clustering Based on Ontology and a Fuzzy Approach
Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...
Three Approaches to Understanding and Classifying Mental Disorder: ICD-11, DSM-5, and the National Institute of Mental Health’s Research Domain Criteria (RDoC)
The classification of mental disorders has long been the subject of controversy among mental health professionals. Despite a significant expansion of knowledge about mental disorders during the past half century, understanding of their processes and components remains rudimentary. This article provides descriptions of three systems with different purposes relevant to understanding and classifyi...
A Method for Keyword Extraction and Term Weighting to Improve Persian Text Classification
Due to the ever-increasing expansion of information and the huge amount of unstructured documents, the use of keywords plays a very important role in information retrieval. Because manual extraction of keywords faces various challenges, automated extraction seems inevitable. In this research, a thesaurus (a structured word-net) is used to extract keywords automatically. A...